MLSLib: A Lip Sync Library for Multi Agents and Languages
Authors
Abstract
This article presents MLSLib, a software library for human figure animation with lip syncing. The library makes it easy to use multiple TTS systems and multiple lip motion generators, and to switch between them arbitrarily. It also supports multiple speaking agents, each possibly using a different TTS system and lip motion generator. MLSLib is composed of three modules: LSSAgent, TTSManager, and FCPManager. The LSSAgent module provides a unified, simple API per agent, independent of the underlying TTS systems and lip motion generators. The TTSManager and FCPManager manage TTS systems and lip motion generators, respectively. Both modules support a standard set of phonetic alphabets per language, freeing users from TTS-dependent implementation of lip motion generators. Applications to multilingual agents and to level-of-detail (LOD) in lip syncing are also presented.
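The module layout described in the abstract can be pictured as two registries (one for TTS back-ends, one for lip motion generators) behind a per-agent facade. The following is a minimal sketch of that arrangement, not MLSLib's actual API: only the module names LSSAgent, TTSManager, and FCPManager come from the article, while every method name and signature here is a hypothetical illustration.

```python
class TTSManager:
    """Hypothetical registry of TTS back-ends. Each back-end maps text to a
    phoneme sequence in a standard, language-specific phonetic alphabet."""
    def __init__(self):
        self._backends = {}

    def register(self, name, synthesize_fn):
        self._backends[name] = synthesize_fn

    def synthesize(self, name, text):
        return self._backends[name](text)


class FCPManager:
    """Hypothetical registry of lip motion generators. Generators consume
    standard phonemes, so they stay independent of any particular TTS."""
    def __init__(self):
        self._generators = {}

    def register(self, name, generate_fn):
        self._generators[name] = generate_fn

    def generate(self, name, phonemes):
        return self._generators[name](phonemes)


class LSSAgent:
    """Per-agent facade: speak() runs TTS, then lip motion generation.
    Either back-end can be swapped at any time by changing its name."""
    def __init__(self, tts_mgr, fcp_mgr, tts_name, fcp_name):
        self.tts_mgr = tts_mgr
        self.fcp_mgr = fcp_mgr
        self.tts_name = tts_name
        self.fcp_name = fcp_name

    def speak(self, text):
        phonemes = self.tts_mgr.synthesize(self.tts_name, text)
        return self.fcp_mgr.generate(self.fcp_name, phonemes)
```

With toy back-ends registered, two agents can then share the managers while using different TTS or lip-motion combinations, which is the multi-agent scenario the abstract highlights.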
Similar articles
Animating Lip-Sync Characters
Speech animation is traditionally considered important but tedious work for most applications, especially when taking lip synchronization (lip-sync) into consideration, because the muscles of the face are complex and interact dynamically. Although several methods have been proposed to ease the burden on artists creating facial and speech animation, almost none are fast and efficient. In thi...
Method for Custom Facial Animation and Lip-Sync in an Unsupported Environment, Second Life™
The virtual world of Second Life™ does not offer support for complex facial animations, such as those needed for an intelligent virtual agent to lip sync to audio clips. However, it is possible to access a limited range of default facial animations through the native scripting language, LSL. Our solution to produce lip sync in this environment is to rapidly trigger and stop these default anima...
Detecting audio-visual synchrony using deep neural networks
In this paper, we address the problem of automatically detecting whether the audio and visual speech modalities in frontal pose videos are synchronous or not. This is of interest in a wide range of applications, for example spoof detection in biometrics, lip-syncing, speaker detection and diarization in multi-subject videos, and video data quality assurance. In our adopted approach, we investig...
ObamaNet: Photo-realistic lip-sync from text
We present ObamaNet, the first architecture that takes any text as input and generates both the corresponding speech and synchronized photo-realistic lip-sync videos. Contrary to other published lip-sync approaches, ours is only composed of fully trainable neural modules and does not rely on any traditional computer graphics methods. More precisely, we use three main modules: a text-to-speech n...
Automated Gesturing for Embodied Animated Agent: Speech-driven and Text-driven Approaches
We present two methods for automatic facial gesturing of graphically embodied animated agents. In one case, a conversational agent is driven by speech in an automatic lip sync process: by analyzing the speech input, lip movements are determined from the speech signal. The other method provides a virtual speaker capable of reading plain English text and rendering it in the form of speech accompanied by the app...